Challenges for Implementing FAIR Digital Objects with High Performance Workflows
Authors
Abstract
New types of workflows are being used in science that couple traditional distributed and high-performance computing (HPC) with data-intensive approaches, and orchestrate ensembles of numerical simulations and artificial intelligence (AI) models. Such workflows may use AI models to supplement computation where numerical simulations may be too computationally expensive, to automate trivial yet time consuming operations, and to perform preliminary selections among intractable numbers of combinations in domains as diverse as protein binding, fine-grid climate simulations, and drug discovery. They offer renewed opportunities for scientific research but exhibit high computational, storage and communications requirements [Goble et al. 2020, Al-Saadi 2021, da Silva 2021]. These workflows can be orchestrated by workflow management systems (WMS) built upon composable blocks that facilitate task placement and resource allocation for parallel executions on high performance systems [Lee, Merzky]. The communities running these kinds of workflows have been slow to adopt the Findable, Accessible, Interoperable, and Re-usable (FAIR) principles, in part due to the complexity of workflow life cycles, the numerous WMS, the specificity of HPC systems with rapidly evolving architectures and software stacks, and execution modes that require resource managers and batch schedulers [Plale]. FAIR Digital Objects (FDO) that encapsulate bit sequences of data, metadata, and persistent identifiers (PID) can help promote the adoption of FAIR, enable knowledge extraction and dissemination, and contribute to re-use [De Smedt 2020]. As workflows typically use data and metadata during planning and execution, FDOs are particularly adapted to workflows [Wittenburg]. But the benefits of FDOs, such as automating data processing and actionable DO collections, cannot be realized without the main components of FDOs, rich metadata and clear identifiers, being universally adopted in the community. These are still elusive for HPC digital objects. Some metadata are added after results are produced, are not described by controlled vocabularies, or are left unconstrained, resulting in inefficient processes and loss of knowledge. Persistent identifiers are added at publication to data supporting conclusions, so only a very small amount of data is shared outside the community of researchers “in the know”.
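As a rough illustration of the three FDO components named above (a bit sequence, metadata, and a persistent identifier), the following minimal sketch bundles them into one record. The class name `DigitalObject`, its fields, and the `example:` identifier scheme are hypothetical and not taken from any FDO specification; a real PID would be issued and resolved by a registration service.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DigitalObject:
    """Hypothetical FDO-like record: bit sequence + metadata + PID."""

    pid: str          # placeholder identifier; real PIDs come from a resolver service
    payload: bytes    # the encapsulated bit sequence
    metadata: dict = field(default_factory=dict)

    def checksum(self) -> str:
        # A content hash lets a consumer verify the bit sequence was not altered.
        return hashlib.sha256(self.payload).hexdigest()

    def to_record(self) -> str:
        # Serialize the descriptive part only; the payload is stored separately.
        record = {
            "pid": self.pid,
            "sha256": self.checksum(),
            "metadata": self.metadata,
        }
        return json.dumps(record, sort_keys=True)


obj = DigitalObject(
    pid="example:0001",  # assumed, illustrative identifier
    payload=b"simulation output bytes",
    metadata={"produced_by": "workflow step (illustrative)"},
)
record = json.loads(obj.to_record())
```

Keeping the checksum in the record alongside the PID is one simple way to tie the metadata to a specific bit sequence; other designs are possible.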
In this conceptual work, one can distinguish several kinds of data that present both common and specific challenges to the development of a canonical infrastructure and the implementation of FDOs, which we discuss below: result data represent the computational results obtained when a program is complete, performance data contain measures from code optimization on parallel, heterogeneous architectures, and intermediate data are the states produced during execution and checkpointing. All should include metadata on the environment and system specifications in which the code was executed, with enough detail to enable re-usability [Pouchard 2019]. Containers are often used to capture the dependencies between the underlying libraries and their versions at installation [Lofstead 2015, Olaya]. Containers published in repositories still have to be made available through identifiers registered with resolvers. For instance, to attribute a Digital Object Identifier to code on GitHub, one must take the additional step of registering it into Zenodo. Metadata extracted from the execution context and the framework will support attribution and linking to the workflow. Computational results from machine learning take the form of predictions that result from stochastic training and non-deterministic execution. Neural networks and deep learning models require related provenance and a selection of the quantities needed to interpret the results. What information needs to be included in an FDO encapsulating a model to make it re-usable? A description of the method, data, and experiment is recommended [Gundersen and Kjensmo 2018] and can be instantiated in a DO collection. To be re-usable, a model should include the model architecture, the platform and its version, and a submission script that contains the hyperparameters, the loss function, the batch size, and the number of epochs. Challenges for digital objects containing performance data include those of size, selection, and reduction. Performance data at scale tends to be large, thus a principled approach is needed to determine which counters contribute to the reproducibility of an application [Patki]. The variables selected show their impact on the methods of selection: do outliers impact performance metrics? What thresholds qualify outliers, and what is their impact on the overall execution? A key contributor to the failure to capture important metadata is that provenance is “bolted on” after the fact in a piecemeal, cumbersome manner that impedes further analysis. An appropriate level of abstraction is needed. Capturing metadata automatically should take into account granularity across layers and levels. Intermediate data fuse multiple sources and stages of the workflow [Nicolae 2022]. Some tools already exist. Darshan is a scalable tool for summarizing Input/Output file characteristics [Dai 2019], and Radical Cybertools [Merzky 2021] can produce a graph of the workflow execution. These tools could offer a path forward for services that would guarantee an encapsulation of DOs favorable to re-use.
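The question of what a model's digital object must carry to be re-usable can be made concrete with a small completeness check. This is only a sketch under stated assumptions: the key names in `REQUIRED_FIELDS` are hypothetical labels for the items listed in the abstract (architecture, platform and version, submission script, hyperparameters, function, size, epochs), and reading "function" and "size" as loss function and batch size is an interpretation, not a published schema.

```python
from typing import List

# Hypothetical key names for the metadata items listed in the abstract.
REQUIRED_FIELDS = (
    "model_architecture",
    "platform",
    "platform_version",
    "submission_script",
    "hyperparameters",
    "loss_function",   # assumed reading of "function"
    "batch_size",      # assumed reading of "size"
    "epochs",
)


def missing_fields(metadata: dict) -> List[str]:
    """Return the required keys that are absent or empty in `metadata`."""
    empty_values = (None, "", {}, [])
    return [key for key in REQUIRED_FIELDS if metadata.get(key) in empty_values]


# Illustrative record with two fields deliberately left out.
example_metadata = {
    "model_architecture": "illustrative-network",
    "platform": "illustrative-framework",
    "platform_version": "0.0",
    "submission_script": "submit.sh",
    "hyperparameters": {"learning_rate": 0.001},
    "batch_size": 32,
}
gaps = missing_fields(example_metadata)
```

Running such a check when the workflow executes, rather than at publication time, is one way to avoid metadata being added only after the fact.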
Journal
Journal title: Research Ideas and Outcomes
Year: 2022
ISSN: 2367-7163
DOI: https://doi.org/10.3897/rio.8.e94835